On genomic repeats and reproducibility

Authors

  • Can Firtina
  • Can Alkan

Abstract

Results: Here, we present a comprehensive analysis of the reproducibility of computational characterization of genomic variants using high-throughput sequencing data. We reanalyzed the same datasets twice, using the same tools with the same parameters, altering only the order of reads in the input (i.e. the FASTQ file). Reshuffling caused reads from repetitive regions to be mapped to different locations in the second alignment, and we observed similar results when we only applied a scatter/gather approach to read mapping, without prior shuffling. Our results show that some of the most common variation discovery algorithms do not handle ambiguous read mappings accurately when random locations are selected. In addition, we observed that even when the exact same alignment is used, the GATK HaplotypeCaller generates slightly different call sets, which we pinpoint to the variant filtration step. We conclude that algorithms at each step of genomic variation discovery and characterization need to treat ambiguous mappings in a deterministic fashion to ensure full replication of results.

Availability and implementation: Code, scripts and the generated VCF files are available at DOI:10.5281/zenodo.32611.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
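To make the order dependence concrete, here is a minimal toy sketch in Python. It assumes a simplified aligner that breaks ties among equally good loci by drawing from one seeded random generator in input order. The functions `random_placement`, `deterministic_placement` and `scatter_gather`, along with the read names and loci, are hypothetical illustrations. They are not the code in the Zenodo archive and do not reproduce the behavior of any specific aligner. The sketch shows why a fixed seed alone does not make results reproducible when reads are shuffled or mapped in scatter/gather chunks. It also shows one order-independent alternative, hashing the read name, which is only one possible way to place ambiguous reads deterministically.

```python
import hashlib
import random
from typing import Callable, Dict, List, Tuple

# Toy model (an assumption for illustration): a read is (name, candidate_loci);
# more than one locus means it aligns equally well to several repeat copies.
Read = Tuple[str, List[str]]
Placement = Dict[str, str]


def random_placement(reads: List[Read], seed: int = 42) -> Placement:
    """Break ties with one seeded RNG stream consumed in input order.

    The seed is fixed, but the locus a read receives depends on how many
    random draws were made before it, so reordering the input can move it.
    """
    rng = random.Random(seed)
    placement = {}
    for name, loci in reads:
        placement[name] = rng.choice(loci) if len(loci) > 1 else loci[0]
    return placement


def deterministic_placement(reads: List[Read]) -> Placement:
    """Break ties using only the read itself (a hash of its name).

    The same read always gets the same locus regardless of input order or
    chunking (one possible order-independent rule, not the only one).
    """
    placement = {}
    for name, loci in reads:
        ordered = sorted(loci)  # canonical order, so the candidate-list order is irrelevant
        digest = hashlib.sha256(name.encode("utf-8")).digest()
        placement[name] = ordered[int.from_bytes(digest[:8], "big") % len(ordered)]
    return placement


def scatter_gather(reads: List[Read], n_chunks: int,
                   place: Callable[[List[Read]], Placement]) -> Placement:
    """Split reads into contiguous chunks, place each chunk independently, merge.

    With random_placement, every chunk restarts the RNG from the same seed,
    so reads later in the file receive different draws than in a single run.
    """
    size = max(1, -(-len(reads) // n_chunks))  # ceiling division, at least 1
    merged: Placement = {}
    for start in range(0, len(reads), size):
        merged.update(place(reads[start:start + size]))
    return merged


if __name__ == "__main__":
    # Hypothetical reads; names and coordinates are made up.
    reads = [
        ("read1", ["chr1:100", "chr5:900"]),
        ("read2", ["chr2:50"]),
        ("read3", ["chr1:100", "chr5:900", "chr9:10"]),
        ("read4", ["chr3:7", "chr8:70"]),
    ]
    shuffled = list(reads)
    random.Random(7).shuffle(shuffled)

    print("random, original:", random_placement(reads))
    print("random, shuffled:", random_placement(shuffled))
    print("random, 2 chunks:", scatter_gather(reads, 2, random_placement))
    # Order- and chunk-independent: both comparisons print True.
    print(deterministic_placement(reads) == deterministic_placement(shuffled))
    print(scatter_gather(reads, 2, deterministic_placement) == deterministic_placement(reads))
```

The hash-based choice is reproducible, but it is no more biologically informed than the random one: it still picks one repeat copy arbitrarily and only removes the dependence on input order and chunking.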

Journal:
  • Bioinformatics

Volume 32, Issue 15

Publication date: 2016